computational complexity
e150e6d0a1e5214740c39c6e4503ba7a-Supplemental-Conference.pdf
Appendix382 AAdditional Experiments3383 A.1 Experiments on the ETT datasets384 In the main body, we present a comparison of the benchmark methods on the ETTm2 dataset. In this385 section, we extend our analysis to the remaining three ETT datasets, namely ETTh1, ETTh2, and386 ETTm1, as summarized in Table 7. Our experimental results reveal that Basisformer outperforms all387 other methods in terms of MSE and MAE. In all experiments, lower MSE values indicate better model performance, and we present the best results in boldface. Experimental results with longer length input setting391 Throughout our research, we maintain consistency in our experimental settings by fixing the input392 length to be 96(with a reduced input length of 36for the illness dataset), instead of using a longer393 length.
Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored. This bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem whose set of variables and constraints exponentially grow with the number of agents, and cannot be exploited in the design of efficient algorithms. Inspired by Mean Field approximation techniques used in graphical models, we provide simple upper bounds of the regret lower bound. The corresponding optimization problems have a reduced number of variables and constraints. By tuning the latter, we may explore the trade-off between the achievable regret and the complexity of computing the corresponding exploration process. We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the approximated lower bounds. The regret and computational complexity of ESM are assessed numerically, using both synthetic and real-world experiments in radio communications networks.
in Fixed Dimension Training Neural Networks is NP-Hard
Our results settle the complexity status regarding these parameters number of dimensions and number of ReLUs if the network is assumed to compute the ReLU case, we show fixed-parameter tractability for the combined parameter four ReLUs (or two linear threshold neurons) with zero training error. Finally, in We also answer a question by Froese et al. [2022, JAIR] proving W[1]-hardness for dimensions, which excludes any polynomial-time algorithm for constant dimension. Khalife and Basu [2022, IPCO] showing that both problems are NP-hard for two eral questions are still open. We answer questions by Arora et al. [2018, ICLR] and complexity of these problems has been studied numerous times in recent years, sevsidering ReLU and linear threshold activation functions.
Mixability made efficient: Fast online multiclass logistic regression
Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity which has reduced their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of O(log(Bn)) whereas Online Newton Step achieves O(eBlog(n)) obtaining a double exponential gain in B (a bound on the norm of comparative functions). However, this high statistical performance is at the price of a prohibitive computational complexity O(n37). In this paper, we use quadratic surrogates to make aggregating forecasters more efficient. We show that the resulting algorithm has still high statistical performance for a large class of losses. In particular, we derive an algorithm for multi-class logistic regression with a regret bounded by O(Blog(n)) and a computational complexity of only O(n4).
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of the spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favorably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely, Twins-PCPVT and TwinsSVT. Our proposed architectures are highly efficient and easy to implement, only involving matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code is available at: https://git.io/Twins.
Generalizable Multi-Linear Attention Network
The majority of existing multimodal sequential learning methods focus on how to obtain powerful individual representations and neglect to effectively capture the multimodal joint representation. Bilinear attention network (BAN) is a commonly used integration method, which leverages tensor operations to associate the features of different modalities. However, BAN has a poor compatibility for more modalities, since the computational complexity of the attention map increases exponentially with the number of modalities. Based on this concern, we propose a new method called generalizable multi-linear attention network (MAN), which can associate more modalities in acceptable complexity with hierarchical approximation decomposition. Specifically, considering the fact that softmax attention kernels cannot be decomposed as linear operation directly, we adopt the addition random features mechanism to approximate the non-linear softmax functions with enough theoretical analysis. Furthermore, we also introduce the local sequential constraints, which can be combined with ARF conveniently, as positional information. We conduct extensive experiments on several datasets of corresponding tasks, the experimental results show that MAN could achieve competitive results compared with baseline methods, showcasing the effectiveness of our contributions.
Supplementary Material: Memory-Efficient Approximation Algorithms for MAX-K-CUT and Correlation Clustering
Let ϑ Rd1 and µ Rd2 be the dual variables corresponding to the d1 equality constraints and the d2 inequality constraints respectively. Let X? be an optimal solution to (SDP) and let X?FW be an optimal solution to (SDP-LSE). For ease of notation, let u= A(1)(X) b(1) andv = b(2) A(2)(X), (1) and define (bu,bv), (uFW,vFW) and (u?,v?) by substituting bX, XFW and X? respectively in (1). Upper bound on the objective. Rearranging the terms, using the duality of the `1 and ` norms, and the fact that µ? 0, gives hC, bX i hC,X?i+